On this document, I’ve included the results from the initial exploration into the different model outputs, ranking of covariate influence, performance metrics, and prediction maps.
The majority of the predictors included in the following models are at a daily temporal resolution. However, for the DO and AGI models, we also investigated the inclusion of these two predictors at seasonal and annual temporal resolutions. The remaining environmental predictors are also available at these resolutions, and can included in follow-up models.
The pseudo absences used in these models were generated using correlated random walk approaches, but another quarto document includes models with background sampling pseudo absences. Lastly, hyperparameters were tuned using the caret package and across all models, a learning rate of 0.05 and tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a random set of numbers that will be used to identify which predictor variables should be included in the final model, and which are not informative.
The hypotheses I would like to test with these models are as follows:
H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.
study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?
H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.
study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?
H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.
study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?
H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.
study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?
H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.
study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?
Base models
These three models represent three different options for the base model and either include spatial predictors, a tag ID predictor, both, or neither. These models were developed by splitting the data set into 75/25 train/test, and thus that is the model evaluation approach used here. However, once a model is selected, I can run additional evaluation metrics (i.e., LOO, k-fold). I can also complete these now depending on when that is typically performed.
base_Nspat_Ntag <- readRDS (brt_outputs[7 ])
# model performance
ggBRT:: ggPerformance (base_Nspat_Ntag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.5736928
Correlation 0.8285057
AUC 0.9664000
Per.Expl 58.6164539
cvDeviance 0.8622091
cvCorrelation 0.6620174
cvAUC 0.8766500
cvPer.Expl 37.8042183
#relative influence of predictors
ggBRT:: ggInfluence (base_Nspat_Ntag)
rel.inf
bathy_mean 24.425351
temp_mean 18.087655
sal_mean 10.919911
chl_mean 9.336627
vostr_mean 6.686538
ssh_mean 6.168030
mld_mean 6.048063
bathy_sd 5.587410
uo_mean 3.417308
vo_mean 3.364026
uostr_mean 3.297190
pred_var 2.661890
#explore partial plots
gbm.plot (base_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
base_int <- gbm.interactions (base_Nspat_Ntag)
base_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 2 temp_mean 952.17
2 3 sal_mean 2 temp_mean 710.45
3 8 ssh_mean 2 temp_mean 512.92
4 6 vo_mean 3 sal_mean 417.59
5 10 bathy_mean 3 sal_mean 368.97
6 2 temp_mean 1 chl_mean 365.72
7 10 bathy_mean 8 ssh_mean 278.76
#predictive performance using test dataset
preds <- predict.gbm (base_Nspat_Ntag, dat_test_base,
n.trees = base_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_base$ PA, preds) #get % deviance
dat_pred_base <- cbind (dat_test_base$ PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1 ] == 1 , 2 ]
abs_base <- dat_pred_base[dat_pred_base[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_base, a = abs_base)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (base_Nspat_Ntag)
#eval 75/25
eval_7525_modified (base_Nspat_Ntag,
testInput = dat_test_base,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6700 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3977416 0.6213903 0.8517752 1.136004 0.3067274 0.5861645
base_Nspat_Ytag <- readRDS (brt_outputs[8 ])
# model performance
ggBRT:: ggPerformance (base_Nspat_Ytag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.2305879
Correlation 0.9503452
AUC 0.9979000
Per.Expl 83.3664540
cvDeviance 0.4538959
cvCorrelation 0.8528183
cvAUC 0.9680300
cvPer.Expl 67.2580483
#relative influence of predictors
ggBRT:: ggInfluence (base_Nspat_Ytag)
rel.inf
tag 44.1894066
bathy_mean 17.5965698
temp_mean 11.0672136
sal_mean 6.8107330
chl_mean 4.6665128
ssh_mean 3.6780850
vostr_mean 3.5179667
mld_mean 2.5624157
bathy_sd 1.8533773
uostr_mean 1.2374256
vo_mean 1.1506546
uo_mean 1.0981552
pred_var 0.5714842
#explore partial plots
gbm.plot (base_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
base_int <- gbm.interactions (base_Nspat_Ytag)
base_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 1496.60
2 3 temp_mean 1 tag 1036.49
3 11 bathy_mean 1 tag 988.07
4 2 chl_mean 1 tag 512.43
5 9 ssh_mean 1 tag 369.40
6 11 bathy_mean 3 temp_mean 274.54
7 8 vostr_mean 1 tag 231.62
8 10 mld_mean 1 tag 207.08
#predictive performance using test dataset
preds <- predict.gbm (base_Nspat_Ytag, dat_test_base,
n.trees = base_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_base$ PA, preds) #get % deviance
dat_pred_base <- cbind (dat_test_base$ PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1 ] == 1 , 2 ]
abs_base <- dat_pred_base[dat_pred_base[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_base, a = abs_base)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (base_Nspat_Ytag)
#eval 75/25
eval_7525_modified (base_Nspat_Ytag,
testInput = dat_test_base,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3642804 0.7198741 0.8981109 1.210241 0.3560819 0.8336645
base_Yspat_Ytag <- readRDS (brt_outputs[9 ])
# model performance
ggBRT:: ggPerformance (base_Yspat_Ytag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.1773819
Correlation 0.9648516
AUC 0.9992000
Per.Expl 87.2044910
cvDeviance 0.3814392
cvCorrelation 0.8803459
cvAUC 0.9770700
cvPer.Expl 72.4847405
#relative influence of predictors
ggBRT:: ggInfluence (base_Yspat_Ytag)
rel.inf
tag 40.2206057
dist_coast 24.0608862
lat 9.4741336
temp_mean 4.9067106
bathy_mean 4.6525961
sal_mean 4.4004394
chl_mean 3.5004953
vostr_mean 2.4769759
ssh_mean 1.7393792
mld_mean 1.7041726
bathy_sd 0.8436843
uo_mean 0.5521552
vo_mean 0.5356691
uostr_mean 0.5120463
pred_var 0.4200505
#explore partial plots
gbm.plot (base_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
base_int <- gbm.interactions (base_Yspat_Ytag)
base_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 883.10
2 5 sal_mean 1 tag 512.27
3 4 temp_mean 1 tag 484.21
4 3 chl_mean 1 tag 478.25
5 14 dist_coast 1 tag 452.76
6 12 bathy_mean 1 tag 425.15
7 11 mld_mean 1 tag 201.34
8 14 dist_coast 2 lat 160.15
9 14 dist_coast 4 temp_mean 153.48
10 10 ssh_mean 1 tag 149.67
11 15 pred_var 1 tag 139.75
#predictive performance using test dataset
preds <- predict.gbm (base_Yspat_Ytag, dat_test_base,
n.trees = base_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_base$ PA, preds) #get % deviance
dat_pred_base <- cbind (dat_test_base$ PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1 ] == 1 , 2 ]
abs_base <- dat_pred_base[dat_pred_base[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_base, a = abs_base)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (base_Yspat_Ytag)
#eval 75/25
eval_7525_modified (base_Yspat_Ytag,
testInput = dat_test_base,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3604686 0.7321882 0.9049214 1.219883 0.354562 0.8720449
DO models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables as longer time scales (seasonal/annual).
0m, no spatial, yes tag 0m, yes spatial, yes tag 0m & 60m, no spatial, yes tag 0m & 250m, no spatial, yes tag 0m, 60m, & 250m, no spatial, yes tag 0m, 60m, & 250m, yes spatial, yes tag
do_0m_Nspat_Ytag <- readRDS (brt_outputs[14 ])
# model performance
ggBRT:: ggPerformance (do_0m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3408774
Correlation 0.9058550
AUC 0.9897000
Per.Expl 75.4108818
cvDeviance 0.5312851
cvCorrelation 0.8146489
cvAUC 0.9559100
cvPer.Expl 61.6758566
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Nspat_Ytag)
rel.inf
tag 40.9080062
bathy_mean 23.7019091
o2_mean_0m 14.1885697
temp_mean 4.2249870
chl_mean 3.4047577
sal_mean 2.9679120
ssh_mean 2.7533054
vostr_mean 1.6135297
mld_mean 1.4718496
bathy_sd 1.0656363
pred_var 1.0330993
uostr_mean 0.9371363
vo_mean 0.9056197
uo_mean 0.8236821
#explore partial plots
gbm.plot (do_0m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 1100.34
2 2 o2_mean_0m 1 tag 577.24
3 5 sal_mean 1 tag 519.34
4 4 temp_mean 1 tag 249.16
5 10 ssh_mean 1 tag 212.28
6 12 bathy_mean 4 temp_mean 154.44
7 3 chl_mean 1 tag 153.94
8 13 bathy_sd 1 tag 126.85
9 11 mld_mean 1 tag 117.52
10 14 pred_var 1 tag 104.95
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3833426 0.6762378 0.8716359 1.194065 0.2956827 0.7541088
do_0m_Yspat_Ytag <- readRDS (brt_outputs[15 ])
# model performance
ggBRT:: ggPerformance (do_0m_Yspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3169398
Correlation 0.9131868
AUC 0.9912000
Per.Expl 77.1376144
cvDeviance 0.4910827
cvCorrelation 0.8306125
cvAUC 0.9627200
cvPer.Expl 64.5758521
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Yspat_Ytag)
rel.inf
tag 39.3243042
dist_coast 23.4675901
o2_mean_0m 9.2630160
bathy_mean 7.6602469
lat 7.4436658
temp_mean 2.1578941
chl_mean 2.1270429
sal_mean 2.0922701
ssh_mean 1.4504847
mld_mean 1.0484217
pred_var 0.9808561
vostr_mean 0.7765505
bathy_sd 0.6166353
uostr_mean 0.5730787
vo_mean 0.5270670
uo_mean 0.4908760
#explore partial plots
gbm.plot (do_0m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Yspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 537.19
2 13 bathy_mean 1 tag 441.38
3 3 o2_mean_0m 1 tag 378.08
4 15 dist_coast 1 tag 190.11
5 6 sal_mean 1 tag 188.58
6 4 chl_mean 1 tag 144.71
7 11 ssh_mean 1 tag 112.11
8 5 temp_mean 1 tag 96.88
9 16 pred_var 1 tag 90.22
10 14 bathy_sd 1 tag 83.50
11 3 o2_mean_0m 2 lat 75.98
12 12 mld_mean 1 tag 69.54
13 5 temp_mean 2 lat 62.79
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Yspat_Ytag, dat_test_do,
n.trees = do_0m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_Yspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3722069 0.697656 0.8838555 1.186513 0.3310462 0.7713761
do_0m_60m_Nspat_Ytag <- readRDS (brt_outputs[13 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3275756
Correlation 0.9101734
AUC 0.9904000
Per.Expl 76.3704047
cvDeviance 0.5211133
cvCorrelation 0.8185662
cvAUC 0.9575900
cvPer.Expl 62.4096016
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_Nspat_Ytag)
rel.inf
tag 39.7630992
bathy_mean 22.3644729
o2_mean_0m 13.1609254
o2_mean_60m 5.7538384
chl_mean 3.4719050
temp_mean 3.0814247
sal_mean 2.8796698
ssh_mean 2.4945921
mld_mean 1.4530477
vostr_mean 1.2564378
pred_var 1.0243164
bathy_sd 0.9583521
vo_mean 0.8437129
uostr_mean 0.8075706
uo_mean 0.6866347
#explore partial plots
gbm.plot (do_0m_60m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 1007.87
2 2 o2_mean_0m 1 tag 483.04
3 5 sal_mean 1 tag 376.66
4 14 o2_mean_60m 1 tag 244.01
5 4 temp_mean 1 tag 223.33
6 10 ssh_mean 1 tag 169.05
7 3 chl_mean 1 tag 158.22
8 13 bathy_sd 1 tag 133.89
9 12 bathy_mean 4 temp_mean 98.89
10 11 mld_mean 1 tag 89.75
11 7 uostr_mean 1 tag 85.56
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_60m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_60m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3825586 0.6793962 0.874085 1.197922 0.2981937 0.763704
do_0m_250m_Nspat_Ytag <- readRDS (brt_outputs[10 ])
# model performance
ggBRT:: ggPerformance (do_0m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3181841
Correlation 0.9124632
AUC 0.9911000
Per.Expl 77.0478579
cvDeviance 0.5109006
cvCorrelation 0.8216070
cvAUC 0.9592600
cvPer.Expl 63.1462878
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_250m_Nspat_Ytag)
rel.inf
tag 39.6887396
o2_mean_0m 16.8057119
o2_mean_250m 16.3627195
bathy_mean 11.6711360
temp_mean 2.7622574
chl_mean 2.2458410
sal_mean 2.1797729
ssh_mean 1.9088688
pred_var 1.1660577
mld_mean 1.0505143
bathy_sd 0.9624893
vostr_mean 0.9540213
uostr_mean 0.7842279
uo_mean 0.7318377
vo_mean 0.7258045
#explore partial plots
gbm.plot (do_0m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_250m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 624.45
2 2 o2_mean_0m 1 tag 515.76
3 14 o2_mean_250m 1 tag 451.56
4 5 sal_mean 1 tag 437.13
5 14 o2_mean_250m 2 o2_mean_0m 206.71
6 3 chl_mean 1 tag 190.92
7 10 ssh_mean 1 tag 169.35
8 4 temp_mean 1 tag 130.26
9 13 bathy_sd 1 tag 108.91
10 5 sal_mean 2 o2_mean_0m 103.02
11 15 pred_var 1 tag 95.95
#predictive performance using test dataset
preds <- predict.gbm (do_0m_250m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_250m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6250 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3824959 0.6804338 0.8739071 1.200324 0.2923615 0.7704786
do_0m_60m_250m_Nspat_Ytag <- readRDS (brt_outputs[11 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3191965
Correlation 0.9122045
AUC 0.9909000
Per.Expl 76.9748229
cvDeviance 0.5103819
cvCorrelation 0.8220382
cvAUC 0.9594100
cvPer.Expl 63.1837098
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Nspat_Ytag)
rel.inf
tag 38.7855718
o2_mean_0m 16.7326988
o2_mean_250m 15.8676740
bathy_mean 11.1462711
o2_mean_60m 3.6391761
temp_mean 2.3942027
sal_mean 2.1301516
chl_mean 2.0440835
ssh_mean 1.6371297
pred_var 1.0443052
mld_mean 0.9945159
vostr_mean 0.8817324
bathy_sd 0.8374595
uostr_mean 0.7158327
vo_mean 0.6096311
uo_mean 0.5395639
#explore partial plots
gbm.plot (do_0m_60m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 569.59
2 15 o2_mean_250m 1 tag 463.02
3 2 o2_mean_0m 1 tag 458.51
4 5 sal_mean 1 tag 328.66
5 4 temp_mean 1 tag 178.30
6 3 chl_mean 1 tag 175.45
7 14 o2_mean_60m 1 tag 168.87
8 10 ssh_mean 1 tag 147.64
9 15 o2_mean_250m 2 o2_mean_0m 141.39
10 13 bathy_sd 1 tag 101.16
11 16 pred_var 1 tag 73.16
12 11 mld_mean 1 tag 72.52
13 7 uostr_mean 1 tag 68.16
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_60m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3812725 0.6825025 0.8758667 1.198771 0.29994 0.7697482
do_0m_60m_250m_Yspat_Ytag <- readRDS (brt_outputs[12 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Yspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3011707
Correlation 0.9188137
AUC 0.9925000
Per.Expl 78.2751108
cvDeviance 0.4820971
cvCorrelation 0.8345785
cvAUC 0.9641300
cvPer.Expl 65.2240256
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Yspat_Ytag)
rel.inf
tag 37.5967831
dist_coast 20.6808825
o2_mean_0m 10.5992826
o2_mean_250m 8.5196739
lat 4.6145739
bathy_mean 3.8293017
o2_mean_60m 2.9614725
temp_mean 1.9990685
chl_mean 1.8128857
sal_mean 1.6521470
ssh_mean 1.2297337
pred_var 1.0165713
mld_mean 0.8872307
bathy_sd 0.6138532
uostr_mean 0.5819161
vostr_mean 0.4989877
vo_mean 0.4618753
uo_mean 0.4437606
#explore partial plots
gbm.plot (do_0m_60m_250m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Yspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 367.17
2 13 bathy_mean 1 tag 301.62
3 3 o2_mean_0m 1 tag 273.23
4 17 o2_mean_250m 1 tag 253.83
5 6 sal_mean 1 tag 176.41
6 4 chl_mean 1 tag 142.58
7 15 dist_coast 1 tag 137.30
8 11 ssh_mean 1 tag 106.92
9 16 o2_mean_60m 1 tag 100.63
10 5 temp_mean 1 tag 76.80
11 18 pred_var 1 tag 74.23
12 14 bathy_sd 1 tag 69.96
13 12 mld_mean 1 tag 67.22
14 9 vo_mean 1 tag 54.84
15 10 vostr_mean 1 tag 49.70
16 17 o2_mean_250m 3 o2_mean_0m 40.74
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Yspat_Ytag, dat_test_do,
n.trees = do_0m_60m_250m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Yspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3747511 0.6952711 0.8822997 1.196225 0.3168627 0.7827511
AGI models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables as longer time scales (seasonal/annual).
0m, no spatial, yes tag 0m, yes spatial, yes tag 0m & 60m, no spatial, yes tag 0m & 250m, no spatial, yes tag 0m, 60m, & 250m, no spatial, yes tag 0m, 60m, & 250m, yes spatial, yes tag
agi_0m_Nspat_Ytag <- readRDS (brt_outputs[5 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3294417
Correlation 0.9120135
AUC 0.9914000
Per.Expl 76.2356104
cvDeviance 0.5331603
cvCorrelation 0.8132824
cvAUC 0.9557600
cvPer.Expl 61.5402963
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Nspat_Ytag)
rel.inf
tag 41.4342777
bathy_mean 22.5768731
temp_mean 8.8146014
AGI_0m 5.3409758
sal_mean 4.3539170
ssh_mean 4.0682555
chl_mean 3.4388349
vostr_mean 2.4504783
uostr_mean 1.5708916
mld_mean 1.3780963
bathy_sd 1.3161888
vo_mean 1.2188877
pred_var 1.0875914
uo_mean 0.9501306
#explore partial plots
gbm.plot (agi_0m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 1 tag 803.70
2 4 sal_mean 1 tag 650.52
3 13 AGI_0m 3 temp_mean 565.85
4 3 temp_mean 1 tag 323.97
5 9 ssh_mean 1 tag 312.45
6 11 bathy_mean 3 temp_mean 257.21
7 13 AGI_0m 1 tag 254.47
8 12 bathy_sd 1 tag 207.39
9 2 chl_mean 1 tag 203.19
10 10 mld_mean 1 tag 153.81
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7250 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3894603 0.6689158 0.8672566 1.217187 0.2785315 0.7623561
agi_0m_Yspat_Ytag <- readRDS (brt_outputs[6 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Yspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3079406
Correlation 0.9177214
AUC 0.9925000
Per.Expl 77.7865966
cvDeviance 0.4916562
cvCorrelation 0.8299703
cvAUC 0.9625900
cvPer.Expl 64.5342064
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Yspat_Ytag)
rel.inf
tag 39.1704772
dist_coast 22.9067106
lat 9.2973311
bathy_mean 7.5281540
temp_mean 4.6964781
AGI_0m 4.1053374
sal_mean 2.6165565
chl_mean 2.4600045
ssh_mean 1.8019982
pred_var 1.1308360
mld_mean 0.9622477
vostr_mean 0.7571586
bathy_sd 0.7305617
uostr_mean 0.6250570
uo_mean 0.6156725
vo_mean 0.5954189
#explore partial plots
gbm.plot (agi_0m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Yspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 646.34
2 12 bathy_mean 1 tag 375.27
3 5 sal_mean 1 tag 260.79
4 3 chl_mean 1 tag 187.34
5 4 temp_mean 1 tag 176.90
6 15 dist_coast 1 tag 162.12
7 14 AGI_0m 1 tag 154.37
8 13 bathy_sd 1 tag 134.66
9 16 pred_var 1 tag 101.32
10 12 bathy_mean 4 temp_mean 99.58
11 10 ssh_mean 1 tag 92.42
12 11 mld_mean 1 tag 67.39
13 9 vostr_mean 1 tag 61.50
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Yspat_Ytag, dat_test_agi,
n.trees = agi_0m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_Yspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6350 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3785647 0.6906597 0.880208 1.211341 0.3127222 0.777866
agi_0m_60m_Nspat_Ytag <- readRDS (brt_outputs[4 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3298267
Correlation 0.9115959
AUC 0.9912000
Per.Expl 76.2078336
cvDeviance 0.5248204
cvCorrelation 0.8175566
cvAUC 0.9574200
cvPer.Expl 62.1418985
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_Nspat_Ytag)
rel.inf
tag 41.1130461
bathy_mean 22.1175206
temp_mean 8.8167602
AGI_0m 5.1923827
sal_mean 3.8146119
AGI_60m 3.5914584
ssh_mean 3.3646809
chl_mean 2.8805052
vostr_mean 2.2362877
mld_mean 1.3300028
uostr_mean 1.2641285
vo_mean 1.2078786
bathy_sd 1.1634177
pred_var 1.0421567
uo_mean 0.8651621
#explore partial plots
gbm.plot (agi_0m_60m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 1 tag 680.43
2 13 AGI_0m 3 temp_mean 677.26
3 4 sal_mean 1 tag 537.76
4 3 temp_mean 1 tag 412.50
5 14 AGI_60m 1 tag 247.42
6 13 AGI_0m 1 tag 229.24
7 9 ssh_mean 1 tag 222.42
8 11 bathy_mean 3 temp_mean 187.87
9 12 bathy_sd 1 tag 151.81
10 10 mld_mean 1 tag 151.59
11 2 chl_mean 1 tag 146.54
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_60m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_60m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6700 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3859565 0.6748523 0.8716447 1.212224 0.2957176 0.7620783
agi_0m_250m_Nspat_Ytag <- readRDS (brt_outputs[1 ])
# model performance
ggBRT:: ggPerformance (agi_0m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3250560
Correlation 0.9119144
AUC 0.9912000
Per.Expl 76.5519740
cvDeviance 0.5193933
cvCorrelation 0.8180693
cvAUC 0.9578900
cvPer.Expl 62.5333843
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_250m_Nspat_Ytag)
rel.inf
tag 40.4877590
AGI_250m 16.0481464
bathy_mean 11.8550705
temp_mean 11.3274832
AGI_0m 4.0294752
sal_mean 3.1191434
ssh_mean 2.8171750
chl_mean 2.4561101
vostr_mean 1.7373119
bathy_sd 1.1323632
mld_mean 1.0927830
uostr_mean 1.0841796
pred_var 1.0292176
vo_mean 1.0064972
uo_mean 0.7772848
#explore partial plots
gbm.plot (agi_0m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_250m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 628.92
2 11 bathy_mean 1 tag 513.82
3 14 AGI_250m 1 tag 395.22
4 3 temp_mean 1 tag 374.74
5 2 chl_mean 1 tag 242.62
6 9 ssh_mean 1 tag 218.77
7 13 AGI_0m 3 temp_mean 203.96
8 13 AGI_0m 1 tag 202.13
9 14 AGI_250m 3 temp_mean 136.10
10 12 bathy_sd 1 tag 120.44
11 15 pred_var 1 tag 112.61
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_250m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_250m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6450 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3901855 0.6688813 0.8688766 1.221388 0.2807949 0.7655197
agi_0m_60m_250m_Nspat_Ytag <- readRDS (brt_outputs[2 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3194562
Correlation 0.9142194
AUC 0.9917000
Per.Expl 76.9559147
cvDeviance 0.5170729
cvCorrelation 0.8191713
cvAUC 0.9583200
cvPer.Expl 62.7007660
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Nspat_Ytag)
rel.inf
tag 39.9057995
AGI_250m 15.9521143
bathy_mean 11.5951719
temp_mean 11.0079998
AGI_0m 3.9165535
sal_mean 2.9778595
AGI_60m 2.5421021
ssh_mean 2.3136229
chl_mean 2.1326512
vostr_mean 1.7126189
uostr_mean 1.2278622
bathy_sd 1.0302718
mld_mean 1.0176906
pred_var 1.0118376
vo_mean 0.9572175
uo_mean 0.6986267
#explore partial plots
gbm.plot (agi_0m_60m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 598.01
2 11 bathy_mean 1 tag 474.54
3 15 AGI_250m 1 tag 312.33
4 3 temp_mean 1 tag 311.72
5 2 chl_mean 1 tag 242.05
6 14 AGI_60m 1 tag 224.33
7 9 ssh_mean 1 tag 200.59
8 13 AGI_0m 1 tag 191.78
9 13 AGI_0m 3 temp_mean 179.03
10 12 bathy_sd 1 tag 122.19
11 15 AGI_250m 3 temp_mean 119.14
12 8 vostr_mean 1 tag 81.82
13 16 pred_var 1 tag 80.72
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_60m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6400 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3876588 0.6736314 0.8714891 1.21862 0.2894265 0.7695591
agi_0m_60m_250m_Yspat_Ytag <- readRDS (brt_outputs[3 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Yspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.2996914
Correlation 0.9203897
AUC 0.9931000
Per.Expl 78.3816567
cvDeviance 0.4880000
cvCorrelation 0.8312630
cvAUC 0.9631300
cvPer.Expl 64.7979495
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Yspat_Ytag)
rel.inf
tag 38.8772398
dist_coast 19.6961943
lat 9.2401730
AGI_250m 7.9838041
bathy_mean 4.2759852
temp_mean 4.1810197
AGI_0m 3.2807915
AGI_60m 2.2096760
sal_mean 2.1899170
chl_mean 2.0352351
ssh_mean 1.2407309
pred_var 1.0135341
mld_mean 0.9384425
bathy_sd 0.6260702
vo_mean 0.5836817
uostr_mean 0.5717539
vostr_mean 0.5376097
uo_mean 0.5181415
#explore partial plots
gbm.plot (agi_0m_60m_250m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Yspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 554.02
2 17 AGI_250m 1 tag 272.42
3 12 bathy_mean 1 tag 268.95
4 5 sal_mean 1 tag 205.22
5 3 chl_mean 1 tag 175.43
6 16 AGI_60m 1 tag 134.83
7 4 temp_mean 1 tag 134.10
8 15 dist_coast 1 tag 126.25
9 14 AGI_0m 1 tag 118.51
10 18 pred_var 1 tag 91.28
11 13 bathy_sd 1 tag 73.70
12 10 ssh_mean 1 tag 62.40
13 11 mld_mean 1 tag 54.65
14 8 vo_mean 1 tag 53.91
15 17 AGI_250m 2 lat 44.17
16 9 vostr_mean 1 tag 40.01
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Yspat_Ytag, dat_test_agi,
n.trees = agi_0m_60m_250m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Yspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6250 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3811484 0.68772 0.8796713 1.218843 0.3059674 0.7838166
Summary table of results
output_sum <- read.csv (here ("data/brt/mod_outputs/brt_crw_output_summary.csv" ))
kableExtra:: kable (output_sum)
base_0m_Nspat_Ntag
58.62
0.634
0.723
0.7550
0.948
0.3096
0.793
0.948
0.9923
0.586
base_0m_Nspat_Ytag
83.37
0.286
0.745
0.9199
0.992
0.1920
0.926
0.992
0.9940
0.834
base_0m_Yspat_Ytag
87.20
0.232
0.747
0.9360
0.995
0.1710
0.942
0.995
0.9960
0.872
do_0m_Nspat_Ytag
75.41
0.393
0.740
0.8480
0.981
0.2400
0.881
0.981
0.9980
0.754
do_0m_Yspat_Ytag
77.14
0.366
0.741
0.8580
0.984
0.2300
0.890
0.984
0.9970
0.771
do_0m_60m_Nspat_Ytag
76.37
0.382
0.749
0.8550
0.982
0.2360
0.885
0.982
0.9990
0.764
do_0m_250m_Nspat_Ytag
77.05
0.374
0.741
0.8570
0.983
0.2340
0.886
0.983
0.9980
0.770
do_0m_60m_250m_Nspat_Ytag
76.97
0.376
0.741
0.8530
0.882
0.2350
0.885
0.982
0.9990
0.770
do_0m_60m_250m_Yspat_Ytag
78.28
0.356
0.742
0.8650
0.985
0.2270
0.893
0.985
0.9980
0.783
agi_0m_Nspat_Ytag
76.24
0.377
0.741
0.8660
0.984
0.2310
0.890
0.984
1.0000
0.762
agi_0m_Yspat_Ytag
77.78
0.351
0.743
0.8780
0.986
0.2230
0.898
0.986
1.0020
0.778
agi_0m_60m_Nspat_Ytag
76.21
0.377
0.741
0.8620
0.984
0.2310
0.890
0.984
1.0000
0.762
agi_0m_250m_Nspat_Ytag
76.55
0.371
0.742
0.8650
0.984
0.2300
0.891
0.984
1.0000
0.765
agi_0m_60m_250m_Nspat_Ytag
76.96
0.366
0.742
0.8650
0.985
0.2280
0.893
0.985
1.0000
0.770
agi_0m_60m_250m_Yspat_Ytag
78.38
0.345
0.743
0.8790
0.987
0.2210
0.900
0.987
1.0000
0.784
DO models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.
0m, no spatial, no tag 0m, yes spatial, no tag 0m & 60m, no spatial, no tag 0m & 250m, no spatial, no tag 0m, 60m, & 250m, no spatial, no tag 0m, 60m, & 250m, yes spatial, no tag
do_0m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[12 ])
# model performance
ggBRT:: ggPerformance (do_0m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.6208259
Correlation 0.7971732
AUC 0.9514000
Per.Expl 55.2164882
cvDeviance 0.8375316
cvCorrelation 0.6744345
cvAUC 0.8808600
cvPer.Expl 39.5843427
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Nspat_Ntag)
rel.inf
bathy_mean 31.526581
o2_mean_0m 23.264544
temp_mean 9.741202
chl_mean 6.921603
sal_mean 5.468262
ssh_mean 4.104042
mld_mean 3.594511
vostr_mean 3.464124
bathy_sd 3.007315
vo_mean 2.617281
uo_mean 2.402761
uostr_mean 2.084738
pred_var 1.803036
#explore partial plots
gbm.plot (do_0m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Nspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 595.51
2 11 bathy_mean 9 ssh_mean 309.62
3 11 bathy_mean 1 o2_mean_0m 258.95
4 6 uostr_mean 1 o2_mean_0m 238.00
5 7 vo_mean 4 sal_mean 217.39
6 9 ssh_mean 2 chl_mean 164.66
7 3 temp_mean 1 o2_mean_0m 154.42
8 9 ssh_mean 3 temp_mean 134.22
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Nspat_Ntag, dat_test_do,
n.trees = do_0m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_Nspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3963781 0.6209157 0.8461656 1.11377 0.3114942 0.5521649
do_0m_Yspat_Ntag <- readRDS (brt_outputs_Ntag[13 ])
# model performance
ggBRT:: ggPerformance (do_0m_Yspat_Ntag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.5580518
Correlation 0.8241516
AUC 0.9634000
Per.Expl 59.7447186
cvDeviance 0.7907379
cvCorrelation 0.6970833
cvAUC 0.8943200
cvPer.Expl 42.9598231
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Yspat_Ntag)
rel.inf
dist_coast 29.217842
o2_mean_0m 16.224849
lat 11.255576
bathy_mean 9.441378
temp_mean 6.087307
chl_mean 5.456816
sal_mean 4.766777
ssh_mean 3.184874
mld_mean 2.755934
vostr_mean 2.493747
bathy_sd 1.924312
vo_mean 1.866466
pred_var 1.786529
uo_mean 1.781369
uostr_mean 1.756226
#explore partial plots
gbm.plot (do_0m_Yspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Yspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 7 uostr_mean 1 lat 264.97
2 12 bathy_mean 4 temp_mean 247.85
3 4 temp_mean 1 lat 177.44
4 8 vo_mean 5 sal_mean 143.64
5 5 sal_mean 3 chl_mean 133.48
6 5 sal_mean 1 lat 128.49
7 10 ssh_mean 4 temp_mean 93.48
8 13 bathy_sd 3 chl_mean 92.15
9 2 o2_mean_0m 1 lat 82.50
10 7 uostr_mean 2 o2_mean_0m 79.01
11 13 bathy_sd 8 vo_mean 78.63
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Yspat_Ntag, dat_test_do,
n.trees = do_0m_Yspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Yspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_Yspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3899882 0.6393687 0.8580075 1.126315 0.329067 0.5974472
do_0m_60m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[11 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.5960029
Correlation 0.8077326
AUC 0.9559000
Per.Expl 57.0071067
cvDeviance 0.8245456
cvCorrelation 0.6801831
cvAUC 0.8845200
cvPer.Expl 40.5210924
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_Nspat_Ntag)
rel.inf
bathy_mean 29.474044
o2_mean_0m 20.685900
o2_mean_60m 9.480854
temp_mean 7.844178
chl_mean 6.789479
sal_mean 5.422841
ssh_mean 3.350090
mld_mean 3.315024
vostr_mean 2.806358
bathy_sd 2.456202
vo_mean 2.346608
uo_mean 2.314745
uostr_mean 1.970002
pred_var 1.743674
#explore partial plots
gbm.plot (do_0m_60m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_Nspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 613.95
2 11 bathy_mean 9 ssh_mean 319.85
3 7 vo_mean 4 sal_mean 255.28
4 11 bathy_mean 1 o2_mean_0m 223.92
5 8 vostr_mean 1 o2_mean_0m 163.94
6 9 ssh_mean 2 chl_mean 141.11
7 6 uostr_mean 1 o2_mean_0m 135.14
8 13 o2_mean_60m 5 uo_mean 130.94
9 3 temp_mean 1 o2_mean_0m 114.13
10 13 o2_mean_60m 11 bathy_mean 102.47
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_Nspat_Ntag, dat_test_do,
n.trees = do_0m_60m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_60m_Nspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3953646 0.6248424 0.8477561 1.118706 0.3120818 0.5700711
do_0m_250m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[8 ])
# model performance
ggBRT:: ggPerformance (do_0m_250m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.5951366
Correlation 0.8078475
AUC 0.9566000
Per.Expl 57.0695915
cvDeviance 0.8188375
cvCorrelation 0.6829591
cvAUC 0.8860800
cvPer.Expl 40.9328434
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_250m_Nspat_Ntag)
rel.inf
o2_mean_0m 25.778726
o2_mean_250m 23.499205
bathy_mean 14.674555
temp_mean 6.793853
sal_mean 5.567258
chl_mean 5.508642
ssh_mean 3.268859
mld_mean 2.659917
bathy_sd 2.467013
vostr_mean 2.183914
vo_mean 2.098303
uo_mean 1.892809
uostr_mean 1.861556
pred_var 1.745391
#explore partial plots
gbm.plot (do_0m_250m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_250m_Nspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 273.93
2 6 uostr_mean 1 o2_mean_0m 230.28
3 7 vo_mean 4 sal_mean 217.00
4 11 bathy_mean 9 ssh_mean 122.51
5 11 bathy_mean 1 o2_mean_0m 114.64
6 13 o2_mean_250m 3 temp_mean 107.88
7 9 ssh_mean 2 chl_mean 100.32
8 12 bathy_sd 2 chl_mean 92.32
9 13 o2_mean_250m 1 o2_mean_0m 84.01
10 12 bathy_sd 1 o2_mean_0m 71.35
#predictive performance using test dataset
preds <- predict.gbm (do_0m_250m_Nspat_Ntag, dat_test_do,
n.trees = do_0m_250m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_250m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_250m_Nspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.395951 0.6237522 0.8475012 1.121258 0.3118321 0.5706959
do_0m_60m_250m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[9 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.5870229
Correlation 0.8110790
AUC 0.9579000
Per.Expl 57.6548789
cvDeviance 0.8118559
cvCorrelation 0.6868417
cvAUC 0.8880400
cvPer.Expl 41.4364689
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Nspat_Ntag)
rel.inf
o2_mean_0m 25.288168
o2_mean_250m 22.833546
bathy_mean 13.619527
temp_mean 6.153364
o2_mean_60m 5.801156
chl_mean 5.075643
sal_mean 5.015392
ssh_mean 2.813548
mld_mean 2.571844
bathy_sd 2.243730
vostr_mean 1.854911
vo_mean 1.782417
uostr_mean 1.732225
uo_mean 1.678592
pred_var 1.535935
#explore partial plots
gbm.plot (do_0m_60m_250m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Nspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 205.44
2 7 vo_mean 4 sal_mean 196.39
3 13 o2_mean_60m 5 uo_mean 139.86
4 14 o2_mean_250m 3 temp_mean 124.25
5 6 uostr_mean 1 o2_mean_0m 124.23
6 13 o2_mean_60m 4 sal_mean 97.65
7 11 bathy_mean 9 ssh_mean 94.77
8 11 bathy_mean 1 o2_mean_0m 92.90
9 13 o2_mean_60m 11 bathy_mean 89.46
10 13 o2_mean_60m 2 chl_mean 88.52
11 13 o2_mean_60m 1 o2_mean_0m 78.06
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Nspat_Ntag, dat_test_do,
n.trees = do_0m_60m_250m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Nspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4600 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3946264 0.6271225 0.8495688 1.121056 0.3153593 0.5765488
do_0m_60m_250m_Yspat_Ntag <- readRDS (brt_outputs_Ntag[10 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Yspat_Ntag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.5527203
Correlation 0.8262598
AUC 0.9640000
Per.Expl 60.1293084
cvDeviance 0.7847145
cvCorrelation 0.7003898
cvAUC 0.8958800
cvPer.Expl 43.3943200
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Yspat_Ntag)
rel.inf
dist_coast 26.854489
o2_mean_0m 17.046558
o2_mean_250m 10.577546
lat 8.073813
o2_mean_60m 5.245044
temp_mean 4.966089
chl_mean 4.654707
sal_mean 4.332791
bathy_mean 4.206105
ssh_mean 2.785887
mld_mean 2.207692
bathy_sd 1.607619
pred_var 1.567828
uostr_mean 1.493819
vostr_mean 1.471591
vo_mean 1.467744
uo_mean 1.440676
#explore partial plots
gbm.plot (do_0m_60m_250m_Yspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Yspat_Ntag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 4 temp_mean 132.83
2 8 vo_mean 5 sal_mean 131.72
3 13 bathy_sd 8 vo_mean 106.08
4 16 o2_mean_250m 4 temp_mean 101.85
5 5 sal_mean 1 lat 93.91
6 7 uostr_mean 1 lat 93.37
7 4 temp_mean 1 lat 78.62
8 15 o2_mean_60m 3 chl_mean 78.16
9 10 ssh_mean 2 o2_mean_0m 70.00
10 5 sal_mean 3 chl_mean 65.17
11 4 temp_mean 2 o2_mean_0m 62.51
12 15 o2_mean_60m 6 uo_mean 61.34
13 5 sal_mean 4 temp_mean 59.83
14 10 ssh_mean 1 lat 59.29
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Yspat_Ntag, dat_test_do,
n.trees = do_0m_60m_250m_Yspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Yspat_Ntag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Yspat_Ntag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3898282 0.6399039 0.8581617 1.12605 0.3299834 0.6012931
AGI models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that AGI may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.
0m, no spatial, no tag 0m, yes spatial, no tag 0m & 60m, no spatial, no tag 0m & 250m, no spatial, no tag 0m, 60m, & 250m, no spatial, no tag 0m, 60m, & 250m, yes spatial, no tag
agi_0m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[5 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.6340293
Correlation 0.7921513
AUC 0.9492000
Per.Expl 54.2644239
cvDeviance 0.8422348
cvCorrelation 0.6722198
cvAUC 0.8801300
cvPer.Expl 39.2455695
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Nspat_Ntag)
rel.inf
bathy_mean 33.820563
temp_mean 15.672469
AGI_0m 13.553923
sal_mean 7.590595
chl_mean 5.893989
ssh_mean 5.078249
vostr_mean 3.672166
mld_mean 3.125983
vo_mean 2.601442
bathy_sd 2.554441
uostr_mean 2.512758
uo_mean 2.163741
pred_var 1.759683
#explore partial plots
gbm.plot (agi_0m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Nspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 1998.30
2 5 uostr_mean 2 temp_mean 484.90
3 10 bathy_mean 2 temp_mean 307.67
4 12 AGI_0m 10 bathy_mean 268.31
5 10 bathy_mean 8 ssh_mean 245.43
6 12 AGI_0m 8 ssh_mean 193.92
7 10 bathy_mean 3 sal_mean 169.83
8 10 bathy_mean 1 chl_mean 144.06
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Nspat_Ntag, dat_test_agi,
n.trees = agi_0m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_Nspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.397247 0.6176718 0.8448034 1.110872 0.3099025 0.5426442
agi_0m_Yspat_Ntag <- readRDS (brt_outputs_Ntag[6 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Yspat_Ntag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.5780131
Correlation 0.8154685
AUC 0.9598000
Per.Expl 58.3051410
cvDeviance 0.7941253
cvCorrelation 0.6953459
cvAUC 0.8929600
cvPer.Expl 42.7159357
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Yspat_Ntag)
rel.inf
dist_coast 31.192044
lat 13.232571
AGI_0m 11.143744
bathy_mean 10.268145
temp_mean 8.169900
sal_mean 5.586783
chl_mean 4.714932
ssh_mean 2.882465
vostr_mean 2.390455
mld_mean 2.379627
uostr_mean 1.674082
pred_var 1.643717
bathy_sd 1.627854
uo_mean 1.552650
vo_mean 1.541032
#explore partial plots
gbm.plot (agi_0m_Yspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Yspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 6 uostr_mean 1 lat 611.59
2 13 AGI_0m 3 temp_mean 437.86
3 3 temp_mean 1 lat 361.18
4 13 AGI_0m 1 lat 325.74
5 11 bathy_mean 3 temp_mean 166.89
6 4 sal_mean 1 lat 161.98
7 4 sal_mean 3 temp_mean 134.61
8 13 AGI_0m 11 bathy_mean 102.86
9 13 AGI_0m 9 ssh_mean 94.53
10 12 bathy_sd 4 sal_mean 64.04
11 13 AGI_0m 4 sal_mean 58.91
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Yspat_Ntag, dat_test_agi,
n.trees = agi_0m_Yspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Yspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_Yspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3902807 0.6373155 0.8559211 1.121823 0.3309482 0.5830514
agi_0m_60m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[4 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.6197682
Correlation 0.7984914
AUC 0.9521000
Per.Expl 55.2931460
cvDeviance 0.8348906
cvCorrelation 0.6763212
cvAUC 0.8824700
cvPer.Expl 39.7753371
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_Nspat_Ntag)
rel.inf
bathy_mean 33.692502
temp_mean 14.934611
AGI_0m 13.455631
sal_mean 7.030234
chl_mean 5.106212
AGI_60m 4.659823
ssh_mean 4.388314
vostr_mean 3.325690
mld_mean 3.155754
vo_mean 2.331571
uostr_mean 2.294309
bathy_sd 2.116380
uo_mean 1.896765
pred_var 1.612204
#explore partial plots
gbm.plot (agi_0m_60m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_Nspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 2120.20
2 5 uostr_mean 2 temp_mean 314.07
3 12 AGI_0m 10 bathy_mean 295.79
4 10 bathy_mean 8 ssh_mean 288.22
5 12 AGI_0m 3 sal_mean 224.15
6 10 bathy_mean 2 temp_mean 204.44
7 10 bathy_mean 1 chl_mean 164.66
8 10 bathy_mean 3 sal_mean 162.36
9 13 AGI_60m 3 sal_mean 151.07
10 13 AGI_60m 10 bathy_mean 125.43
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_Nspat_Ntag, dat_test_agi,
n.trees = agi_0m_60m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_60m_Nspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3951172 0.6232473 0.848006 1.110951 0.3167357 0.5529315
agi_0m_250m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[1 ])
# model performance
ggBRT:: ggPerformance (agi_0m_250m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.6021139
Correlation 0.8057358
AUC 0.9558000
Per.Expl 56.5666349
cvDeviance 0.8206077
cvCorrelation 0.6822655
cvAUC 0.8856300
cvPer.Expl 40.8056357
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_250m_Nspat_Ntag)
rel.inf
AGI_250m 22.838377
temp_mean 17.247651
bathy_mean 16.982394
AGI_0m 11.714465
sal_mean 6.879946
chl_mean 4.686083
ssh_mean 3.667093
vostr_mean 3.183649
mld_mean 2.564665
uostr_mean 2.319691
bathy_sd 2.131897
vo_mean 2.042541
uo_mean 1.929522
pred_var 1.812026
#explore partial plots
gbm.plot (agi_0m_250m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_250m_Nspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 1466.67
2 13 AGI_250m 2 temp_mean 218.76
3 12 AGI_0m 8 ssh_mean 152.09
4 12 AGI_0m 3 sal_mean 140.15
5 5 uostr_mean 2 temp_mean 132.63
6 3 sal_mean 2 temp_mean 127.78
7 10 bathy_mean 1 chl_mean 126.84
8 10 bathy_mean 8 ssh_mean 125.55
9 10 bathy_mean 3 sal_mean 109.81
10 12 AGI_0m 10 bathy_mean 108.81
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_250m_Nspat_Ntag, dat_test_agi,
n.trees = agi_0m_250m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_250m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_250m_Nspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3943126 0.626913 0.8506919 1.120037 0.3174501 0.5656663
agi_0m_60m_250m_Nspat_Ntag <- readRDS (brt_outputs_Ntag[2 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Nspat_Ntag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.5667515
Correlation 0.8224811
AUC 0.9633000
Per.Expl 59.1174968
cvDeviance 0.8149061
cvCorrelation 0.6849465
cvAUC 0.8872100
cvPer.Expl 41.2169189
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Nspat_Ntag)
rel.inf
AGI_250m 21.018347
bathy_mean 17.029046
temp_mean 16.367793
AGI_0m 10.504762
sal_mean 6.417153
AGI_60m 4.812848
chl_mean 4.613102
ssh_mean 3.196548
vostr_mean 3.066020
mld_mean 2.767906
uostr_mean 2.454455
uo_mean 2.076121
bathy_sd 1.970942
vo_mean 1.946589
pred_var 1.758368
#explore partial plots
gbm.plot (agi_0m_60m_250m_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Nspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 1619.82
2 12 AGI_0m 3 sal_mean 301.89
3 14 AGI_250m 2 temp_mean 265.33
4 5 uostr_mean 2 temp_mean 169.21
5 10 bathy_mean 3 sal_mean 150.41
6 10 bathy_mean 2 temp_mean 125.60
7 10 bathy_mean 8 ssh_mean 120.88
8 12 AGI_0m 10 bathy_mean 100.88
9 10 bathy_mean 1 chl_mean 100.70
10 14 AGI_250m 13 AGI_60m 90.76
11 12 AGI_0m 8 ssh_mean 85.03
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Nspat_Ntag, dat_test_agi,
n.trees = agi_0m_60m_250m_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Nspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Nspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5500 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.393462 0.6310036 0.8514003 1.128121 0.3219123 0.591175
agi_0m_60m_250m_Yspat_Ntag <- readRDS (brt_outputs_Ntag[3 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Yspat_Ntag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.5474472
Correlation 0.8299837
AUC 0.9661000
Per.Expl 60.5100091
cvDeviance 0.7845229
cvCorrelation 0.7002350
cvAUC 0.8953900
cvPer.Expl 43.4086045
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Yspat_Ntag)
rel.inf
dist_coast 27.495072
lat 12.015818
AGI_250m 9.983021
AGI_0m 9.789768
temp_mean 7.832861
bathy_mean 5.812078
sal_mean 5.049246
chl_mean 4.305359
AGI_60m 3.697070
mld_mean 2.409316
ssh_mean 2.323457
vostr_mean 1.860367
bathy_sd 1.627986
pred_var 1.535616
uostr_mean 1.453659
uo_mean 1.441664
vo_mean 1.367642
#explore partial plots
gbm.plot (agi_0m_60m_250m_Yspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Yspat_Ntag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 528.25
2 13 AGI_0m 1 lat 389.25
3 6 uostr_mean 1 lat 308.88
4 3 temp_mean 1 lat 192.77
5 4 sal_mean 1 lat 94.77
6 13 AGI_0m 9 ssh_mean 78.08
7 13 AGI_0m 2 chl_mean 71.42
8 12 bathy_sd 4 sal_mean 66.68
9 13 AGI_0m 11 bathy_mean 65.51
10 15 AGI_60m 13 AGI_0m 62.14
11 15 AGI_60m 14 dist_coast 60.98
12 17 pred_var 15 AGI_60m 58.69
13 16 AGI_250m 15 AGI_60m 56.29
14 13 AGI_0m 5 uo_mean 52.34
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Yspat_Ntag, dat_test_agi,
n.trees = agi_0m_60m_250m_Yspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Yspat_Ntag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Yspat_Ntag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3882204 0.6438738 0.8588545 1.12942 0.3364471 0.6051001
Summary table of results
output_sum_Ntag <- read.csv (here ("data/brt/mod_outputs/brt_crw_output_summary_Ntag.csv" ))
kableExtra:: kable (output_sum_Ntag)
base_0m_Nspat_Ntag
58.620
0.634
0.723
0.755
0.948
0.3096
0.793
0.9480
0.9923
0.586
do_0m_Nspat_Ntag
55.216
0.687
0.714
0.699
0.930
0.3280
0.760
0.9300
1.0000
0.552
do_0m_Yspat_Ntag
59.744
0.631
0.720
0.727
0.942
0.3120
0.784
0.9423
1.0000
0.597
do_0m_60m_Nspat_Ntag
57.007
0.666
0.717
0.712
0.935
0.3220
0.769
0.9350
1.0000
0.570
do_0m_250m_Nspat_Ntag
57.070
0.665
0.717
0.717
0.935
0.3220
0.769
0.9350
1.0000
0.571
do_0m_60m_250m_Nspat_Ntag
57.650
0.659
0.718
0.718
0.937
0.3200
0.772
0.9370
1.0000
0.577
do_0m_60m_250m_Yspat_Ntag
60.130
0.629
0.721
0.729
0.943
0.3120
0.785
0.9420
1.0000
0.601
agi_0m_Nspat_Ntag
54.264
0.695
0.714
0.700
0.929
0.3300
0.757
0.9290
0.9970
0.543
agi_0m_Yspat_Ntag
58.310
0.638
0.720
0.729
0.942
0.3150
0.782
0.9420
0.9990
0.583
agi_0m_60m_Nspat_Ntag
55.293
0.685
0.715
0.703
0.931
0.3270
0.761
0.9310
0.9990
0.553
agi_0m_250m_Nspat_Ntag
56.600
0.663
0.718
0.714
0.937
0.3210
0.771
0.9370
0.9990
0.566
agi_0m_60m_250m_Nspat_Ntag
59.120
0.633
0.721
0.731
0.944
0.3130
0.785
0.9440
0.9990
0.591
agi_0m_60m_250m_Yspat_Ntag
60.510
0.608
0.724
0.746
0.949
0.3060
0.796
0.9490
0.9990
0.605
Base models w/o tag ID and w/ data at seasonal and annual resolutions
For these models, the environmental raster data was averaged according to season and year. Observed and pseudo absence locations were then used for environmental data extraction along these raster files and were matched to each file according to either the season or year.
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_base_0m_seas_Nspat_Ntag.rds" ,
test_data = dat_test_base_s)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862921
Residual.Deviance 0.8807219
Correlation 0.6570729
AUC 0.8771000
Per.Expl 36.4692388
cvDeviance 0.9695580
cvCorrelation 0.5944613
cvAUC 0.8396600
cvPer.Expl 30.0610607
[1] "Relative influence of predictor variables"
rel.inf
vostr_mean 15.572003
uostr_mean 13.382375
bathy_mean 13.317586
temp_mean 11.424998
vo_mean 11.376691
ssh_mean 9.814064
sal_mean 9.165748
mld_mean 5.198750
chl_mean 4.650146
uo_mean 2.770662
bathy_sd 1.797589
pred_var 1.529388
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 sal_mean 1 mld_mean 170.30
2 10 bathy_mean 6 uostr_mean 84.52
3 7 vo_mean 4 temp_mean 77.40
4 8 vostr_mean 4 temp_mean 74.31
5 10 bathy_mean 2 sal_mean 60.62
6 8 vostr_mean 2 sal_mean 51.57
7 6 uostr_mean 2 sal_mean 49.62
[1] "Calculated percent deviance"
[1] 0.9041464
[1] "TPR"
[1] 0.6830441
[1] "TSS"
[1] 0.5681
[1] "Percent deviance calculated by in house functions"
[1] 36.46924
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8500 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3850375 0.6407874 0.8673864 0.9890539 0.3475683 0.3646924
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/annual/brt_base_0m_ann_Nspat_Ntag.rds" ,
test_data = dat_test_base_a)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862692
Residual.Deviance 0.7349386
Correlation 0.7459693
AUC 0.9281000
Per.Expl 46.9844254
cvDeviance 0.9417038
cvCorrelation 0.6104769
cvAUC 0.8493500
cvPer.Expl 32.0691975
[1] "Relative influence of predictor variables"
rel.inf
vostr_mean 21.617278
uostr_mean 14.250460
sal_mean 10.315309
bathy_mean 9.874017
mld_mean 8.510153
chl_mean 7.160177
temp_mean 6.989894
ssh_mean 5.713556
vo_mean 5.707209
uo_mean 3.859038
bathy_sd 3.093930
pred_var 2.908980
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 ssh_mean 2 sal_mean 662.38
2 8 vostr_mean 4 temp_mean 556.71
3 8 vostr_mean 6 uostr_mean 501.98
4 6 uostr_mean 3 ssh_mean 250.67
5 6 uostr_mean 2 sal_mean 186.36
6 4 temp_mean 3 ssh_mean 178.75
7 10 bathy_mean 8 vostr_mean 178.13
[1] "Calculated percent deviance"
[1] 0.8014552
[1] "TPR"
[1] 0.7008862
[1] "TSS"
[1] 0.6315782
[1] "Percent deviance calculated by in house functions"
[1] 46.98443
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3585463 0.7019189 0.9031209 0.9931701 0.4218617 0.4698443
DO models w/o tag ID and w/ data at seasonal and annual resolutions
Seasonal, Nspat, Ntag Seasonal, Yspat, Ntag Annual, Nspat, Ntag Annual, Yspat, Ntag Daily, seasonal, and Annual, Nspat, Ntag Daily, Seasonal, and Annual, Yspat, Ntag
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_do_0m_60m_250m_seas_Nspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862918
Residual.Deviance 0.8131538
Correlation 0.6994460
AUC 0.9026000
Per.Expl 41.3432465
cvDeviance 0.9300675
cvCorrelation 0.6197794
cvAUC 0.8540500
cvPer.Expl 32.9096909
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_250m_seas 25.152468
o2_mean_0m_seas 25.038229
temp_mean 8.349439
bathy_mean 8.224763
o2_mean_60m_seas 7.769275
sal_mean 6.060609
ssh_mean 4.248712
chl_mean 4.174670
mld_mean 2.601719
bathy_sd 1.708980
vostr_mean 1.599245
vo_mean 1.477038
uo_mean 1.363871
pred_var 1.157471
uostr_mean 1.073510
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 o2_mean_0m_seas 8 ssh_mean 145.76
2 15 o2_mean_250m_seas 13 o2_mean_0m_seas 108.53
3 10 bathy_mean 2 temp_mean 87.06
4 13 o2_mean_0m_seas 2 temp_mean 70.31
5 14 o2_mean_60m_seas 3 sal_mean 59.91
6 14 o2_mean_60m_seas 2 temp_mean 52.99
7 8 ssh_mean 3 sal_mean 49.60
8 13 o2_mean_0m_seas 1 chl_mean 44.89
9 11 bathy_sd 6 vo_mean 42.78
10 13 o2_mean_0m_seas 10 bathy_mean 38.04
11 6 vo_mean 3 sal_mean 34.64
[1] "Calculated percent deviance"
[1] 0.8354198
[1] "TPR"
[1] 0.6949432
[1] "TSS"
[1] 0.6043099
[1] "Percent deviance calculated by in house functions"
[1] 41.34325
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3676294 0.6829054 0.8912124 0.9932145 0.3973599 0.4134325
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_do_0m_60m_250m_seas_Yspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862918
Residual.Deviance 0.7943064
Correlation 0.7096714
AUC 0.9084000
Per.Expl 42.7028024
cvDeviance 0.9108600
cvCorrelation 0.6311181
cvAUC 0.8612700
cvPer.Expl 34.2952205
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 25.1841934
o2_mean_0m_seas 18.1479440
o2_mean_250m_seas 12.5170316
temp_mean 7.7078633
o2_mean_60m_seas 6.1992179
sal_mean 6.0402895
lat 5.3149710
chl_mean 3.8851264
bathy_mean 3.6556887
ssh_mean 2.9063749
mld_mean 2.1701965
vostr_mean 1.3936376
vo_mean 1.0712342
pred_var 1.0113083
bathy_sd 1.0112493
uostr_mean 0.9104869
uo_mean 0.8731865
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 134.39
2 15 o2_mean_0m_seas 9 ssh_mean 125.42
3 17 o2_mean_250m_seas 4 sal_mean 73.23
4 15 o2_mean_0m_seas 3 temp_mean 52.57
5 17 o2_mean_250m_seas 15 o2_mean_0m_seas 46.03
6 15 o2_mean_0m_seas 2 chl_mean 35.95
7 15 o2_mean_0m_seas 1 lat 32.66
8 9 ssh_mean 4 sal_mean 31.50
9 4 sal_mean 1 lat 31.02
10 14 pred_var 5 uo_mean 30.33
11 3 temp_mean 1 lat 28.85
12 16 o2_mean_60m_seas 3 temp_mean 26.24
13 12 bathy_sd 7 vo_mean 26.07
14 16 o2_mean_60m_seas 4 sal_mean 25.70
[1] "Calculated percent deviance"
[1] 0.8183304
[1] "TPR"
[1] 0.6976123
[1] "TSS"
[1] 0.6129695
[1] "Percent deviance calculated by in house functions"
[1] 42.7028
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3636406 0.6915222 0.896574 0.9916489 0.4096875 0.427028
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_ann_Nspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862918
Residual.Deviance 0.7176601
Correlation 0.7584527
AUC 0.9357000
Per.Expl 48.2316706
cvDeviance 0.9471687
cvCorrelation 0.6067520
cvAUC 0.8474300
cvPer.Expl 31.6760950
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_250m_ann 23.641945
temp_mean 12.889831
o2_mean_0m_ann 10.482357
o2_mean_60m_ann 8.299163
bathy_mean 8.044860
sal_mean 6.948934
chl_mean 6.713889
ssh_mean 4.633439
mld_mean 3.062796
vostr_mean 3.051127
bathy_sd 3.037426
uo_mean 2.433320
vo_mean 2.391635
pred_var 2.219267
uostr_mean 2.150013
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 14 o2_mean_60m_ann 2 temp_mean 289.05
2 13 o2_mean_0m_ann 2 temp_mean 220.91
3 14 o2_mean_60m_ann 3 sal_mean 182.87
4 6 vo_mean 3 sal_mean 120.76
5 8 ssh_mean 3 sal_mean 114.99
6 3 sal_mean 2 temp_mean 103.12
7 13 o2_mean_0m_ann 1 chl_mean 102.61
8 10 bathy_mean 2 temp_mean 87.04
9 14 o2_mean_60m_ann 10 bathy_mean 77.37
10 7 vostr_mean 2 temp_mean 65.93
11 2 temp_mean 1 chl_mean 64.65
[1] "Calculated percent deviance"
[1] 0.7843388
[1] "TPR"
[1] 0.7037686
[1] "TSS"
[1] 0.6405072
[1] "Percent deviance calculated by in house functions"
[1] 48.23167
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3541984 0.7117905 0.9089163 0.9935829 0.4342078 0.4823167
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_ann_Yspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862918
Residual.Deviance 0.6798841
Correlation 0.7782285
AUC 0.9456000
Per.Expl 50.9566399
cvDeviance 0.9261329
cvCorrelation 0.6194279
cvAUC 0.8549100
cvPer.Expl 33.1935140
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 22.057080
o2_mean_250m_ann 11.773782
temp_mean 9.550491
lat 8.313899
sal_mean 7.157929
chl_mean 6.201912
o2_mean_60m_ann 5.734171
o2_mean_0m_ann 5.407617
bathy_mean 4.364136
ssh_mean 3.518572
mld_mean 2.878000
vostr_mean 2.455810
pred_var 2.413791
vo_mean 2.145230
uo_mean 2.082053
bathy_sd 2.006137
uostr_mean 1.939390
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 16 o2_mean_60m_ann 3 temp_mean 212.58
2 16 o2_mean_60m_ann 13 dist_coast 138.39
3 7 vo_mean 4 sal_mean 95.23
4 3 temp_mean 2 chl_mean 94.12
5 13 dist_coast 4 sal_mean 93.00
6 15 o2_mean_0m_ann 3 temp_mean 84.34
7 12 bathy_sd 7 vo_mean 83.82
8 14 pred_var 5 uo_mean 76.36
9 17 o2_mean_250m_ann 9 ssh_mean 73.04
10 16 o2_mean_60m_ann 11 bathy_mean 70.61
11 3 temp_mean 1 lat 67.02
12 6 uostr_mean 1 lat 66.88
13 16 o2_mean_60m_ann 4 sal_mean 64.38
14 16 o2_mean_60m_ann 1 lat 56.92
[1] "Calculated percent deviance"
[1] 0.7561192
[1] "TPR"
[1] 0.7075701
[1] "TSS"
[1] 0.657902
[1] "Percent deviance calculated by in house functions"
[1] 50.95664
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4950 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3468104 0.72634 0.9165027 0.9934549 0.4545643 0.5095664
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862918
Residual.Deviance 0.6142689
Correlation 0.8049682
AUC 0.9558000
Per.Expl 55.6897832
cvDeviance 0.8616534
cvCorrelation 0.6590420
cvAUC 0.8773000
cvPer.Expl 37.8447355
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_250m_ann 17.508228
o2_mean_0m 16.052381
o2_mean_0m_seas 9.525396
temp_mean 6.200544
bathy_mean 5.333500
o2_mean_60m_ann 4.994221
o2_mean_250m_seas 4.891444
o2_mean_60m_seas 4.548159
sal_mean 4.407086
chl_mean 3.955977
o2_mean_0m_ann 2.818546
o2_mean_60m 2.730506
o2_mean_250m 2.565295
ssh_mean 2.552316
mld_mean 2.348539
bathy_sd 1.760717
vostr_mean 1.668218
pred_var 1.649356
vo_mean 1.592859
uo_mean 1.552592
uostr_mean 1.344119
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 231.00
2 13 o2_mean_60m 11 bathy_mean 100.49
3 20 o2_mean_60m_ann 11 bathy_mean 98.72
4 9 ssh_mean 4 sal_mean 95.80
5 15 pred_var 5 uo_mean 93.47
6 16 o2_mean_0m_seas 9 ssh_mean 73.81
7 20 o2_mean_60m_ann 3 temp_mean 64.13
8 16 o2_mean_0m_seas 1 o2_mean_0m 52.64
9 5 uo_mean 3 temp_mean 50.48
10 18 o2_mean_250m_seas 14 o2_mean_250m 43.25
11 13 o2_mean_60m 3 temp_mean 41.61
12 11 bathy_mean 3 temp_mean 40.83
13 7 vo_mean 4 sal_mean 40.52
14 16 o2_mean_0m_seas 2 chl_mean 40.00
15 8 vostr_mean 4 sal_mean 38.58
16 17 o2_mean_60m_seas 4 sal_mean 38.16
17 19 o2_mean_0m_ann 3 temp_mean 34.41
18 8 vostr_mean 3 temp_mean 34.18
19 18 o2_mean_250m_seas 16 o2_mean_0m_seas 33.85
20 4 sal_mean 3 temp_mean 33.18
21 14 o2_mean_250m 11 bathy_mean 32.85
22 19 o2_mean_0m_ann 16 o2_mean_0m_seas 29.31
[1] "Calculated percent deviance"
[1] 0.6793058
[1] "TPR"
[1] 0.7166256
[1] "TSS"
[1] 0.7026526
[1] "Percent deviance calculated by in house functions"
[1] 55.68978
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3251921 0.7649245 0.9346237 0.9966431 0.5099746 0.5568978
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds" ,
test_data = dat_test_do_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862918
Residual.Deviance 0.6040844
Correlation 0.8098984
AUC 0.9584000
Per.Expl 56.4244434
cvDeviance 0.8463090
cvCorrelation 0.6667445
cvAUC 0.8814800
cvPer.Expl 38.9515988
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 18.347823
o2_mean_0m 13.012223
o2_mean_250m_ann 9.919182
o2_mean_0m_seas 7.879467
temp_mean 5.776584
sal_mean 4.488389
o2_mean_60m_seas 4.088402
lat 4.082625
o2_mean_60m_ann 3.740902
chl_mean 3.235370
bathy_mean 2.900908
o2_mean_250m_seas 2.609905
o2_mean_0m_ann 2.464034
o2_mean_250m 2.360227
o2_mean_60m 2.330307
mld_mean 2.149688
ssh_mean 2.035443
vostr_mean 1.604121
vo_mean 1.522583
pred_var 1.488164
uo_mean 1.347736
bathy_sd 1.317520
uostr_mean 1.298396
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 temp_mean 2 o2_mean_0m 250.91
2 22 o2_mean_60m_ann 14 dist_coast 94.71
3 10 ssh_mean 5 sal_mean 85.95
4 20 o2_mean_250m_seas 16 o2_mean_250m 79.14
5 17 pred_var 6 uo_mean 57.74
6 18 o2_mean_0m_seas 10 ssh_mean 57.35
7 12 bathy_mean 4 temp_mean 52.46
8 18 o2_mean_0m_seas 2 o2_mean_0m 49.37
9 4 temp_mean 1 lat 47.92
10 9 vostr_mean 1 lat 41.09
11 22 o2_mean_60m_ann 12 bathy_mean 39.78
12 5 sal_mean 1 lat 36.03
13 20 o2_mean_250m_seas 5 sal_mean 34.76
14 15 o2_mean_60m 4 temp_mean 34.27
15 21 o2_mean_0m_ann 4 temp_mean 31.85
16 14 dist_coast 11 mld_mean 30.44
17 18 o2_mean_0m_seas 3 chl_mean 30.09
18 7 uostr_mean 1 lat 29.21
19 9 vostr_mean 5 sal_mean 28.85
20 18 o2_mean_0m_seas 5 sal_mean 28.79
21 21 o2_mean_0m_ann 18 o2_mean_0m_seas 28.15
22 15 o2_mean_60m 14 dist_coast 28.14
23 2 o2_mean_0m 1 lat 27.99
24 15 o2_mean_60m 12 bathy_mean 26.08
25 22 o2_mean_60m_ann 1 lat 25.67
26 8 vo_mean 1 lat 24.65
[1] "Calculated percent deviance"
[1] 0.6693636
[1] "TPR"
[1] 0.7177252
[1] "TSS"
[1] 0.7108199
[1] "Percent deviance calculated by in house functions"
[1] 56.42444
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3226575 0.7693831 0.9368542 0.9956468 0.5171465 0.5642444
Summary table of results
output_sum <- read.csv (here ("data/brt/mod_outputs/brt_crw_do_seas_ann_output_summary.csv" ))
kableExtra:: kable (output_sum)
brt_base_0m_seas_Nspat_Ntag
36.469
0.904
0.683
0.568
0.867
0.385
0.6641
0.867
0.989
0.364
brt_base_0m_ann_Nspat_Ntag
46.984
0.801
0.701
0.632
0.903
0.359
0.7020
0.903
0.993
0.470
brt_do_0m_60m_250m_seas_Nspat_Ntag
41.343
0.835
0.695
0.604
0.891
0.368
0.6830
0.891
0.993
0.413
brt_do_0m_60m_250m_seas_Yspat_Ntag
42.703
0.818
0.697
0.612
0.897
0.363
0.6920
0.897
0.992
0.427
brt_do_0m_60m_250m_ann_Nspat_Ntag
48.232
0.784
0.704
0.641
0.909
0.354
0.7120
0.909
0.994
0.482
brt_do_0m_60m_250m_ann_Yspat_Ntag
50.957
0.756
0.708
0.658
0.917
0.347
0.7260
0.917
0.993
0.510
brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag
55.689
0.679
0.717
0.703
0.935
0.325
0.7650
0.935
0.997
0.557
brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag
56.424
0.669
0.718
0.711
0.937
0.323
0.7690
0.937
0.996
0.564
AGI models w/o tag ID and w/ data at seasonal and annual resolutions
Seasonal, Nspat, Ntag Seasonal, Yspat, Ntag Annual, Nspat, Ntag Annual, Yspat, Ntag Daily, seasonal, and Annual, Nspat, Ntag Daily, Seasonal, and Annual, Yspat, Ntag
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_agi_0m_60m_250m_seas_Nspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.8268163
Correlation 0.6936130
AUC 0.8986000
Per.Expl 40.3578088
cvDeviance 0.9406789
cvCorrelation 0.6142010
cvAUC 0.8511600
cvPer.Expl 32.1443594
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_seas 22.495151
temp_mean 18.833854
bathy_mean 12.566968
AGI_0m_seas 10.609661
sal_mean 7.724491
AGI_60m_seas 6.588969
chl_mean 4.284478
ssh_mean 3.966623
mld_mean 3.261915
vostr_mean 1.910101
vo_mean 1.882665
bathy_sd 1.880424
uostr_mean 1.532605
uo_mean 1.431103
pred_var 1.030994
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m_seas 2 temp_mean 179.11
2 14 AGI_60m_seas 2 temp_mean 153.23
3 15 AGI_250m_seas 2 temp_mean 99.73
4 10 bathy_mean 2 temp_mean 86.34
5 13 AGI_0m_seas 9 mld_mean 56.60
6 15 AGI_250m_seas 3 sal_mean 56.07
7 6 vo_mean 3 sal_mean 47.39
8 3 sal_mean 2 temp_mean 45.73
9 8 ssh_mean 7 vostr_mean 35.35
10 13 AGI_0m_seas 8 ssh_mean 33.50
11 15 AGI_250m_seas 8 ssh_mean 29.20
[1] "Calculated percent deviance"
[1] 0.8573576
[1] "TPR"
[1] 0.6919068
[1] "TSS"
[1] 0.591201
[1] "Percent deviance calculated by in house functions"
[1] 40.35781
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3732395 0.6706646 0.8851429 1.004349 0.3815426 0.4035781
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_agi_0m_60m_250m_seas_Yspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.8046600
Correlation 0.7051691
AUC 0.9051000
Per.Expl 41.9560507
cvDeviance 0.9214250
cvCorrelation 0.6256203
cvAUC 0.8579300
cvPer.Expl 33.5332361
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 28.6804740
temp_mean 10.5713934
AGI_250m_seas 9.9822707
AGI_0m_seas 9.7909657
lat 9.4561834
sal_mean 6.9826658
bathy_mean 4.3401161
AGI_60m_seas 4.1728652
chl_mean 4.1354833
ssh_mean 2.9829684
mld_mean 2.4380740
vostr_mean 1.2223646
vo_mean 1.1839411
bathy_sd 1.1201075
uo_mean 1.0086115
uostr_mean 0.9777738
pred_var 0.9537415
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 114.29
2 16 AGI_60m_seas 3 temp_mean 102.24
3 17 AGI_250m_seas 4 sal_mean 81.86
4 15 AGI_0m_seas 3 temp_mean 77.40
5 4 sal_mean 3 temp_mean 59.33
6 15 AGI_0m_seas 10 mld_mean 48.89
7 3 temp_mean 2 chl_mean 46.44
8 4 sal_mean 1 lat 41.65
9 3 temp_mean 1 lat 36.48
10 13 dist_coast 4 sal_mean 26.50
11 8 vostr_mean 3 temp_mean 26.34
12 15 AGI_0m_seas 9 ssh_mean 25.43
13 13 dist_coast 3 temp_mean 23.53
14 15 AGI_0m_seas 2 chl_mean 20.08
[1] "Calculated percent deviance"
[1] 0.8361803
[1] "TPR"
[1] 0.695381
[1] "TSS"
[1] 0.6086011
[1] "Percent deviance calculated by in house functions"
[1] 41.95605
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3681478 0.6819723 0.8921047 1.003881 0.3968189 0.4195605
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_ann_Nspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.6959050
Correlation 0.7712113
AUC 0.9420000
Per.Expl 49.8010643
cvDeviance 0.9441861
cvCorrelation 0.6097757
cvAUC 0.8486600
cvPer.Expl 31.8913704
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_ann 22.406041
temp_mean 17.235664
bathy_mean 8.397350
sal_mean 8.298281
AGI_60m_ann 7.666685
chl_mean 6.325482
AGI_0m_ann 5.276719
ssh_mean 4.597771
mld_mean 3.773413
bathy_sd 3.160114
vostr_mean 2.932974
vo_mean 2.798101
uo_mean 2.498186
uostr_mean 2.416789
pred_var 2.216428
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 sal_mean 2 temp_mean 154.17
2 15 AGI_250m_ann 2 temp_mean 151.55
3 2 temp_mean 1 chl_mean 135.85
4 15 AGI_250m_ann 13 AGI_0m_ann 132.98
5 15 AGI_250m_ann 14 AGI_60m_ann 104.14
6 15 AGI_250m_ann 3 sal_mean 86.20
7 14 AGI_60m_ann 8 ssh_mean 65.73
8 8 ssh_mean 1 chl_mean 58.47
9 7 vostr_mean 2 temp_mean 57.14
10 14 AGI_60m_ann 10 bathy_mean 52.37
11 10 bathy_mean 2 temp_mean 50.80
[1] "Calculated percent deviance"
[1] 0.7649684
[1] "TPR"
[1] 0.7078821
[1] "TSS"
[1] 0.6642425
[1] "Percent deviance calculated by in house functions"
[1] 49.80106
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5100 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3476602 0.7264609 0.9171669 1.001948 0.4481878 0.4980106
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_ann_Yspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.6705049
Correlation 0.7830331
AUC 0.9474000
Per.Expl 51.6332921
cvDeviance 0.9180075
cvCorrelation 0.6254329
cvAUC 0.8580900
cvPer.Expl 33.7797521
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 22.826936
temp_mean 10.590660
AGI_250m_ann 10.555821
lat 9.964422
sal_mean 7.218453
chl_mean 5.728103
AGI_60m_ann 5.505228
AGI_0m_ann 4.845873
bathy_mean 4.141557
ssh_mean 3.669272
mld_mean 3.011325
vo_mean 2.074831
pred_var 2.072059
vostr_mean 2.022348
uo_mean 1.952451
bathy_sd 1.920724
uostr_mean 1.899938
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 lat 141.89
2 4 sal_mean 3 temp_mean 137.65
3 3 temp_mean 2 chl_mean 136.75
4 16 AGI_60m_ann 1 lat 97.93
5 7 vo_mean 4 sal_mean 97.39
6 13 dist_coast 3 temp_mean 88.57
7 11 bathy_mean 9 ssh_mean 79.33
8 4 sal_mean 1 lat 77.73
9 16 AGI_60m_ann 11 bathy_mean 71.48
10 11 bathy_mean 3 temp_mean 69.82
11 17 AGI_250m_ann 13 dist_coast 64.01
12 13 dist_coast 4 sal_mean 59.92
13 16 AGI_60m_ann 13 dist_coast 56.36
14 9 ssh_mean 8 vostr_mean 52.48
[1] "Calculated percent deviance"
[1] 0.7410847
[1] "TPR"
[1] 0.7110043
[1] "TSS"
[1] 0.6766013
[1] "Percent deviance calculated by in house functions"
[1] 51.63329
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4950 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3412925 0.7386613 0.9234047 1.002404 0.4654164 0.5163329
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.5710221
Correlation 0.8264079
AUC 0.9654000
Per.Expl 58.8094597
cvDeviance 0.8440518
cvCorrelation 0.6685080
cvAUC 0.8819900
cvPer.Expl 39.1145274
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_ann 13.001178
temp_mean 12.830749
AGI_0m 10.918778
bathy_mean 7.845397
AGI_0m_seas 6.952762
sal_mean 5.272075
AGI_60m_ann 4.988499
AGI_250m_seas 4.690531
AGI_60m_seas 4.217959
AGI_0m_ann 3.771886
ssh_mean 3.665159
AGI_250m 3.494551
chl_mean 3.068918
mld_mean 2.540812
vostr_mean 2.043824
vo_mean 1.961488
bathy_sd 1.907967
AGI_60m 1.892487
uostr_mean 1.739288
uo_mean 1.683661
pred_var 1.512032
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 1616.55
2 20 AGI_60m_ann 16 AGI_0m_seas 195.59
3 16 AGI_0m_seas 14 AGI_250m 149.01
4 12 AGI_0m 3 sal_mean 134.56
5 12 AGI_0m 10 bathy_mean 132.39
6 19 AGI_0m_ann 16 AGI_0m_seas 128.34
7 12 AGI_0m 8 ssh_mean 110.96
8 7 vostr_mean 2 temp_mean 95.25
9 13 AGI_60m 10 bathy_mean 67.72
10 16 AGI_0m_seas 2 temp_mean 56.94
11 20 AGI_60m_ann 8 ssh_mean 53.36
12 8 ssh_mean 7 vostr_mean 50.54
13 16 AGI_0m_seas 9 mld_mean 47.71
14 18 AGI_250m_seas 2 temp_mean 47.00
15 20 AGI_60m_ann 18 AGI_250m_seas 46.84
16 19 AGI_0m_ann 14 AGI_250m 45.94
17 18 AGI_250m_seas 14 AGI_250m 43.14
18 18 AGI_250m_seas 12 AGI_0m 40.10
19 16 AGI_0m_seas 8 ssh_mean 38.33
20 17 AGI_60m_seas 12 AGI_0m 37.36
21 16 AGI_0m_seas 13 AGI_60m 35.01
22 20 AGI_60m_ann 2 temp_mean 34.67
[1] "Calculated percent deviance"
[1] 0.6504556
[1] "TPR"
[1] 0.7204943
[1] "TSS"
[1] 0.7204452
[1] "Percent deviance calculated by in house functions"
[1] 58.80946
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3171731 0.779098 0.9424043 1.003772 0.5307919 0.5880946
explore_brt (mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds" ,
test_data = dat_test_agi_all)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.5657433
Correlation 0.8276641
AUC 0.9660000
Per.Expl 59.1902464
cvDeviance 0.8319448
cvCorrelation 0.6737772
cvAUC 0.8851700
cvPer.Expl 39.9878669
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 19.509438
AGI_0m 10.566285
lat 7.347935
AGI_0m_seas 6.737591
AGI_250m_ann 6.703165
temp_mean 6.651113
sal_mean 4.954255
AGI_60m_ann 3.872629
AGI_0m_ann 3.357691
bathy_mean 3.343414
AGI_60m_seas 3.305459
chl_mean 3.272516
AGI_250m 3.067225
ssh_mean 2.910558
AGI_250m_seas 2.513152
mld_mean 2.118438
AGI_60m 1.595637
vo_mean 1.438415
vostr_mean 1.413769
uostr_mean 1.378996
pred_var 1.373796
uo_mean 1.327595
bathy_sd 1.240930
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 908.36
2 3 temp_mean 1 lat 455.29
3 22 AGI_60m_ann 18 AGI_0m_seas 227.43
4 18 AGI_0m_seas 16 AGI_250m 203.39
5 13 AGI_0m 11 bathy_mean 103.94
6 13 AGI_0m 9 ssh_mean 101.83
7 21 AGI_0m_ann 18 AGI_0m_seas 87.16
8 13 AGI_0m 4 sal_mean 78.28
9 20 AGI_250m_seas 16 AGI_250m 64.62
10 6 uostr_mean 1 lat 64.42
11 20 AGI_250m_seas 4 sal_mean 60.21
12 8 vostr_mean 3 temp_mean 54.88
13 15 AGI_60m 14 dist_coast 54.37
14 15 AGI_60m 11 bathy_mean 50.84
15 22 AGI_60m_ann 1 lat 48.24
16 13 AGI_0m 1 lat 36.82
17 22 AGI_60m_ann 9 ssh_mean 36.48
18 4 sal_mean 1 lat 36.04
19 4 sal_mean 3 temp_mean 34.42
20 20 AGI_250m_seas 13 AGI_0m 32.95
21 21 AGI_0m_ann 1 lat 32.78
22 10 mld_mean 8 vostr_mean 30.54
23 18 AGI_0m_seas 9 ssh_mean 30.43
24 18 AGI_0m_seas 10 mld_mean 26.08
25 22 AGI_60m_ann 2 chl_mean 25.21
26 22 AGI_60m_ann 20 AGI_250m_seas 24.46
[1] "Calculated percent deviance"
[1] 0.6396553
[1] "TPR"
[1] 0.7215425
[1] "TSS"
[1] 0.7290931
[1] "Percent deviance calculated by in house functions"
[1] 59.19025
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5500 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3142732 0.7838679 0.9445091 1.003898 0.5385828 0.5919025
Summary table of results
output_sum <- read.csv (here ("data/brt/mod_outputs/brt_crw_agi_seas_ann_output_summary.csv" ))
kableExtra:: kable (output_sum)
brt_base_0m_seas_Nspat_Ntag
36.469
0.904
0.683
0.568
0.867
0.385
0.6641
0.867
0.989
0.364
brt_base_0m_ann_Nspat_Ntag
46.984
0.801
0.701
0.632
0.903
0.359
0.7020
0.903
0.993
0.470
brt_agi_0m_60m_250m_seas_Nspat_Ntag
40.358
0.857
0.692
0.591
0.885
0.373
0.6710
0.885
1.004
0.404
brt_agi_0m_60m_250m_seas_Yspat_Ntag
41.956
0.836
0.695
0.609
0.892
0.368
0.6820
0.892
1.003
0.420
brt_agi_0m_60m_250m_ann_Nspat_Ntag
49.801
0.765
0.708
0.664
0.917
0.347
0.7260
0.917
1.000
0.498
brt_agi_0m_60m_250m_ann_Yspat_Ntag
51.633
0.741
0.711
0.677
0.923
0.341
0.7390
0.923
1.000
0.516
brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag
58.809
0.650
0.720
0.720
0.942
0.317
0.7790
0.942
1.000
0.588
brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag
59.190
0.640
0.722
0.729
0.945
0.314
0.7840
0.945
1.003
0.591